Automatic text categorization algorithm based on a cascaded support vector machine classifier
A practical method for automatically categorizing Chinese web pages is implemented using the results of corpus training together with the vector space model (VSM).
At the recent National Symposium on Search Engine and Web Mining, a Chinese web page automatic categorization contest was held, with ten teams from across the country taking part.
Search engines are an important tool for information retrieval on the Internet. Automatic categorization of Chinese web pages is an important research direction in implementing a Chinese search engine.
After introducing the computational model for automatic patent categorization, this paper summarizes the role of metadata in automatic patent categorization and analyzes the influence of metadata on its accuracy.
The IDF (inverse document frequency) formula, used in information retrieval to compute the relevance weight between keywords and related documents, is applied to the automatic categorization process. Combined with the results of analyzing Chinese web pages, this yields a weight formula with adjustable parameters. To meet the formula's requirements, a categorization weight vector library is designed and built to store the results of categorization training.
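As a rough illustration of IDF-based category weighting, the sketch below builds one weight vector per category and scores a new page against each. It is not the formula from the quoted paper, which is not given here: the `alpha` scaling parameter, the function names, and the toy training data are all assumptions made for the example.

```python
import math
from collections import Counter


def idf_weights(docs, alpha=1.0):
    """Return alpha * log(N / df) for every term.

    docs is a list of token lists; alpha is a hypothetical adjustable
    parameter standing in for the paper's unspecified one.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: alpha * math.log(n / count) for term, count in df.items()}


def build_weight_library(training, alpha=1.0):
    """Map each category to a {term: term_frequency * idf} vector."""
    all_docs = [tokens for docs in training.values() for tokens in docs]
    idf = idf_weights(all_docs, alpha)
    library = {}
    for category, docs in training.items():
        tf = Counter(term for tokens in docs for term in tokens)
        library[category] = {term: count * idf[term] for term, count in tf.items()}
    return library


def classify(tokens, library):
    """Pick the category whose weight vector gives the highest summed score."""
    def score(vector):
        return sum(vector.get(term, 0.0) for term in tokens)
    return max(library, key=lambda category: score(library[category]))


# Toy, made-up training data: already-segmented pages grouped by category.
training = {
    "sports": [["football", "match", "team"], ["team", "score"]],
    "finance": [["stock", "market"], ["market", "bank", "team"]],
}
library = build_weight_library(training)
print(classify(["stock", "market", "team"], library))
print(classify(["football", "team"], library))
```

Terms that occur in most documents, such as `team` here, receive a small IDF and so contribute little to any category's score.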
This paper analyzes the structural components of a web page that contribute to the categorization process. In view of the characteristics of Chinese web pages and the word segmentation quality required in web page analysis, it simplifies and adjusts the existing longest/second-longest match word segmentation algorithm so that it is better suited to automatic categorization.
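For readers unfamiliar with dictionary-based Chinese word segmentation, the sketch below shows plain forward maximum (longest) matching. It is a simplified stand-in, not the adjusted longest/second-longest algorithm the sentence refers to, whose details are not given here; the function name, the `max_len` limit, and the tiny vocabulary are assumptions.

```python
def forward_max_match(text, vocab, max_len=4):
    """Segment text greedily, taking the longest vocabulary word at each position.

    Characters that start no known word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary for illustration only.
vocab = {"中文", "网页", "自动", "分类", "自动分类"}
print(forward_max_match("中文网页自动分类", vocab))
```

Because the longest candidate is tried first, `自动分类` is kept as one token rather than split into `自动` and `分类`.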